---
title: "Replication Paper"
author: "Haruka Braun, Lucy Ding, and Camryn Jones"
date: '\today'
fontsize: 12pt
output:
  pdf_document: default
urlcolor: red
header-includes:
  - \usepackage{bm}
---

# Paper Outline

## Introduction

In the paper entitled 'The COVID-19 Pandemic and the 2020 US Presidential Election' (2021) by Leonardo Baccini, Abel Brodeur, and Stephen Weymoth published in the Journal of Population Economics, the authors demonstrate that rising COVID-19 cases ultimately had a negative effect on the vote-share that went to Trump in the 2020 presidential election cycle. They use estimates from OLS and 2SLS models, to ultimately perform a counterfactual analysis to show what the vote-share for Trump would have looked like in some of the swing states if cases were 5%, 10%, and 20% lower.
We were interested in investigating whether such trends were true for the 2020 House elections, and used 2018 data to compare for the treatment of the pandemic.

## Replication

We replicated Figures 1 and 2, as well as Table 1, 2 and 3. Figure 1 maps the cumulative COVID-19 cases per 10,000. Figure 2 maps the changes in share of votes for Donald Trump from 2016 to 2020. Table 1 represents the descriptive statistics of the data. Table 2 showcases the impact of COVID-19 cases with OLS estimates. Finally, Table 3 shows the counterfactual outcomes in closely contested states won by Biden.

## Discrepancies

When we commenced investigation for  the replication process, we ran into some challenges due to discrepancies in the methodology of the paper. When replicating table 1 (summary statistics), n = 2586 in the original paper, while n = 3110 in our replication. After we dropped what seemed to be everything dropped in the stata code, we arrived at n = 2700. It is still unclear what other inputs were dropped.
The confidence intervals used in Table 2 at the 90% confidence level are not statistically significant, which means that the counterfactuals in Table 3 are not necessarily accurate for the full range. 
Additionally, the replication code for Table 3 states: “Done manually in excel” and provides no further information outside of what is written in the paper. 

## Extension

To understand whether the model used to investigate changes in Trump vote share in swing states changed if COVID-19 cases were 5%, 10%, and 20% lower, we used the same models from the paper but applied them to data representing change in vote-share for the Congressional House elections from 2018 to 2020. We found that unlike the conclusion of the main paper, a rise in COVID cases would have led to increased chances of Republican wins in seats in the House. This challenges the partisan narrative that the authors lead the paper with.

## Conclusion

While the original paper claims that changes in COVID cases may have affected vote-share for Trump in swing states, this trend was not true when applied to data of change in vote share for Republican seats in Congressional House elections from 2018 to 2020.

# Set Up and Data Cleaning
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r, warning = FALSE, message = FALSE}
# Load necessary packages
library(usmap)
library(haven)
library(tidyverse)
library(dplyr)
library(sf)
library(ggplot2)
library(janitor)
library(sandwich)
library(lmtest)
library(AER)
library(stargazer)
library(lfe)
library(estimatr)
library(stats)
library(MatchIt)
library(optmatch)
library(tableone)
library(cobalt)
library(CBPS)
library(gt)
library(tigris)
library(gtsummary)
library(psych)
library(fixest)
options(scipen = 100)
select <- dplyr::select
```

```{r, warning = FALSE, message = FALSE}
# Read in paper data and election data
Data <- read_dta("repli_main_tables.dta") %>%
  clean_names()
Election_2020 <- read_csv("2020_0_0_2.csv") %>%
  clean_names()
Election_2016 <- read_csv("county_2016.csv") %>%
  clean_names()

# Merge paper data with election data
Final <- merge(Data, Election_2020, by = "fips")
Final <- merge(Final, Election_2016, by = "fips")

# Load fips data and merge with final data
data(fips_codes)
fips_codes <- fips_codes %>%
  mutate(fips = paste(state_code, county_code, sep = "")) %>%
  mutate(fips = as.numeric(fips))
Final <- left_join(Final, fips_codes, by = "fips")

# Clean final data
Final <- Final %>%
  clean_names() %>%
  mutate(
    total_vote_20 = as.numeric(total_vote_x),
    trump_20 = as.numeric(donald_j_trump_x),
    total_vote_16 = as.numeric(total_vote_y),
    trump_16 = as.numeric(donald_j_trump_y),
    clinton_16 = as.numeric(hillary_clinton),
    biden = as.numeric(joseph_r_biden_jr)
  ) %>%
  mutate(state_code_num = as.numeric(state_code)) %>%
  filter(state_code_num < 60) %>%
  mutate(
    trump_voting_2020 = (trump_20 / total_vote_20) * 100,
    changes_trump = (trump_20 / total_vote_20 - trump_16 / total_vote_16) * 100,
    changes_total = total_vote_20 - total_vote_16,
    covid_cases = (population / 10000) * cases_10000,
    covid_deaths = (population / 100000) * deaths_100000,
    rep_share_2020 = (trump_20 / total_vote_20) * 100,
    rep_share_2016 = (trump_16 / total_vote_16) * 100,
    depvar = rep_share_2020 - rep_share_2016,
    covid_cases = (population / 10000) * cases_10000,
    cases_10000_2 = cases_10000 * cases_10000,
    trump_gap = trump_20 - biden
  )
```

# Map 1

```{r, message = FALSE}
# Plot cumulative COVID-19 cases per 10,000 on US map
plot_usmap(
  data = Final, values = "cases_oct_10000", exclude = c("AK", "HI"),
  size = 0.1
) +
  scale_fill_continuous() +
  scale_fill_gradient2(low = "white", high = "blue1") +
  theme(legend.position = "left") +
  labs(title = "Fig 1. Cumulative COVID-19 cases per 10,000")
```

# Map 2

```{r}
# Plot changes in share of votes for Donald Trump from 2016 to 2018 on US map
plot_usmap(
  data = Final, values = "depvar", exclude = c("AK", "HI"),
  size = 0.1
) +
  scale_fill_continuous() +
  scale_fill_gradient2(low = "white", high = "blue1") +
  theme(legend.position = "left") +
  labs(
    title =
      "Fig 2. Changes in share of votes for Donald Trump from 2016 to 2020"
  )
```

# Table 1

```{r}
# Drop missing values
tab1df <- Final %>%
  drop_na(trump_voting_2020, workplaces_april, s_emp_s2018) %>%
  select(
    trump_voting_2020, changes_trump, changes_total, covid_cases,
    cases_10000, covid_deaths, deaths_100000, i_animal_processing,
    unemp_change_covid1
  )
```

```{r}
# Create summary table of election outcomes, COVID-19, and labor outcomes
describe(tab1df, fast = TRUE, skew = FALSE) %>%
  select(mean, sd, max, min, n) %>%
  gt(rownames_to_stub = TRUE) %>%
  tab_header(title = "Table 1") %>%
  tab_row_group(
    label = "Labor outcomes",
    rows = c("i_animal_processing", "unemp_change_covid1")
  ) %>%
  tab_row_group(
    label = "COVID-19 incidence",
    rows = c("covid_cases", "cases_10000", "covid_deaths", "deaths_100000")
  ) %>%
  tab_row_group(
    label = "Election outcomes",
    rows = c("trump_voting_2020", "changes_trump", "changes_total")
  ) %>%
  cols_label(
    mean = "Mean",
    sd = "S.D.",
    max = "Max",
    min = "Min"
  ) %>%
  fmt_number(
    columns = 1:5,
    decimals = 1
  )
```


# Table 2
```{r}
# Drop missing values
Final_tab2 <- Final %>%
  drop_na(workplaces_april, s_emp_s2018)

# Column 1 replication - OLS fixed-effects with demographic and socioeconomic
# controls
model_c1 <- felm(data = Final_tab2, depvar ~ cases_10000 + black2018 +
  white_nonhispanic2018 + college2018 + foreignborn2018 +
  female2018 + totalpop2018 + age_20_242010 + age_25_342010 +
  age_35_442010 + age_45_542010 + age_55_592010 +
  age_60_642010 + age_65_742010 + age_75_842010 +
  age_85over2010 + exposure + proximity + remindex +
  crit_work | istate | 0 | istate)

# Column 2 replication - OLS fixed-effects with demographic, socioeconomic,
# and social mobility controls
model_c2 <- felm(data = Final_tab2, depvar ~ cases_10000 + black2018 +
  white_nonhispanic2018 + college2018 + foreignborn2018 +
  female2018 + totalpop2018 + age_20_242010 + age_25_342010 +
  age_35_442010 + age_45_542010 + age_55_592010 +
  age_60_642010 + age_65_742010 + age_75_842010 +
  age_85over2010 + exposure + proximity + remindex +
  crit_work + workplaces_april | istate | 0 | istate)

# Column 3 replication - OLS fixed-effects with demographic, socioeconomic,
# social mobility, and unemployment controls
model_c3 <- felm(data = Final_tab2, depvar ~ cases_10000 + black2018 +
  white_nonhispanic2018 + college2018 + foreignborn2018 +
  female2018 + totalpop2018 + age_20_242010 + age_25_342010 +
  age_35_442010 + age_45_542010 + age_55_592010 +
  age_60_642010 + age_65_742010 + age_75_842010 +
  age_85over2010 + exposure + proximity + remindex +
  crit_work + workplaces_april + unemp_change_covid1 |
  istate | 0 | istate)
```

```{r, warning = FALSE, message = FALSE}
# Table 2 replication with standard errors
stargazer(model_c1, model_c2, model_c3, type = "text", digits = 4, title = 'Table 2 Replication with Standard Errors')
```

```{r, warning = FALSE, message = FALSE}
# Table 2 replication with confidence intervals
stargazer(model_c1, model_c2, model_c3, type = "text", digits = 5, 
          omit.stat = "ser", ci = TRUE, ci.level = 0.90, title = 'Table 2 Replication with 90% Confidence Intervals', align =  TRUE)
```


# Table 3

```{r}
# Counterfactuals with 90% confidence intervals
c1 <- confint(model_c1, level = 0.90)
Final %>%
  mutate(paper = -0.0012 * cases_10000 * 0.05,
         c_upper = c1[1] * cases_10000 * 0.05,
         c_lower = c1[1,2] * cases_10000 * 0.05,
         additional = paper * total_vote_20,
         additional_upper = c_upper * total_vote_20,
         additional_lower = c_lower * total_vote_20) %>%
  group_by(state) %>%
  summarize(trump_gap = sum(trump_gap),
            additional = sum(additional),
            upper_bound = sum(additional_upper),
            lower_bound = sum(additional_lower)
            ) %>%
  filter(state %in% c("AZ", "GA", "MI", "PA", "WI")) %>%
  gt()
```

# Extension

```{r, warning = FALSE, message = FALSE}
# Read in county level data for the US House of Representatives
county_house_2020 <- read_csv("county_house_2020.csv") %>%
  clean_names()
county_house_2018 <- read_csv("county_house_2018.csv") %>%
  clean_names()

# Join with control variables and clean data
county_house <-
  left_join(county_house_2020, county_house_2018,
    by = c("fips", "geographic_name", "geographic_subtype")
  ) %>%
  slice(-c(1)) %>%
  mutate(
    fips = as.numeric(fips),
    total_vote_20 = as.numeric(total_vote.x),
    democratic_20 = as.numeric(democratic.x),
    republican_20 = as.numeric(republican.x),
    total_vote_18 = as.numeric(total_vote.y),
    democratic_18 = as.numeric(democratic.y),
    republican_18 = as.numeric(republican.y)
  ) %>%
  select(
    fips, geographic_name, total_vote_20, democratic_20, republican_20,
    total_vote_18, democratic_18, republican_18
  )

extension <- left_join(Data, county_house, by = "fips") %>%
  mutate(
    rep_vote_share_20 = (republican_20 / total_vote_20) * 100,
    rep_vote_share_18 = (republican_18 / total_vote_18) * 100,
    depvar_ext = rep_vote_share_20 - rep_vote_share_18
  )
```

```{r, warning = FALSE, message = FALSE}
# Fixed effects linear model identical to column 3 using data from the US House
model_ext <- felm(data = extension %>% drop_na(workplaces_april, s_emp_s2018), 
                  depvar_ext ~ cases_10000 + black2018 + white_nonhispanic2018 + 
                    college2018 + foreignborn2018 + female2018 + totalpop2018 + 
                    age_20_242010 + age_25_342010 + age_35_442010 + 
                    age_45_542010 + age_55_592010 + age_60_642010 + 
                    age_65_742010 + age_75_842010 + age_85over2010 + exposure + 
                    proximity + remindex + crit_work + workplaces_april + 
                    unemp_change_covid1 | istate | 0 | istate)

stargazer(model_ext, type = "text", digits = 4, title = 
            'Fixed Effects Linear Model from Column 3 of Table 2 applied to US House Elections Data')
```






